[CT] Fix CT Config to honor fp8_inc KV cache dtype #929

Merged
xuechendi merged 3 commits into vllm-project:main from yiliu30:fix-llmc-kv on Feb 5, 2026

@yiliu30 yiliu30 commented Feb 4, 2026

Adapt to the update in vllm-project/vllm#30141. The relevant upstream code:

        # llm-compressor models need to set cache_dtype to "fp8" manually.
        if getattr(quant_config, "kv_cache_scheme", None) is not None:
            kv_cache_dtype = "fp8"
            calculate_kv_scales = False
            if cache_config is not None:
                cache_config.cache_dtype = "fp8"
                cache_config.calculate_kv_scales = False

        self.kv_cache_torch_dtype = kv_cache_dtype_str_to_dtype(
            kv_cache_dtype, vllm_config.model_config
        )
        self.kv_cache_dtype = kv_cache_dtype

cc @hshen14 @thuang6 @lkk12014402

Signed-off-by: yiliu30 <yi4.liu@intel.com>

Copilot AI left a comment

Pull request overview

This PR fixes the Compressed Tensors config for HPU (Habana Processing Unit) so that it honors the fp8_inc KV cache dtype instead of falling back to the default fp8 format.

Changes:

  • Added a custom __init__ method to HPUCompressedTensorsConfig that overrides KV cache settings after parent initialization
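
The pattern described above (run the parent initializer, then override the KV cache settings) might look roughly like the sketch below. This is not the code merged in this PR: `ParentConfigSketch`, `HPUConfigSketch`, and the `kv_cache_dtype` attribute are hypothetical stand-ins, and the assumption that the parent stores an `fp8` default is made purely for illustration.

```python
class ParentConfigSketch:
    """Hypothetical stand-in for the upstream parent config."""

    def __init__(self, kv_cache_scheme=None):
        self.kv_cache_scheme = kv_cache_scheme
        # Assumption: the parent picks plain "fp8" when a KV cache scheme exists.
        self.kv_cache_dtype = "fp8" if kv_cache_scheme is not None else "auto"


class HPUConfigSketch(ParentConfigSketch):
    """Hypothetical stand-in for the HPU subclass."""

    def __init__(self, kv_cache_scheme=None):
        # Let the parent finish first, then adjust what it decided.
        super().__init__(kv_cache_scheme)
        if self.kv_cache_scheme is not None:
            # Override the parent's default with the HPU-specific dtype.
            self.kv_cache_dtype = "fp8_inc"


with_scheme = HPUConfigSketch(kv_cache_scheme={"num_bits": 8, "type": "float"})
without_scheme = HPUConfigSketch()
print(with_scheme.kv_cache_dtype, without_scheme.kv_cache_dtype)
```

Overriding after `super().__init__()` keeps the subclass small: it only touches the fields it disagrees with and inherits every other setting unchanged.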

github-actions bot commented Feb 5, 2026

✅ CI Passed

All checks passed successfully against the following vllm commit:
17b17c068453e6dc6af79240bb94857ae175cc51

@xuechendi xuechendi merged commit 175572b into vllm-project:main Feb 5, 2026
55 checks passed